Using EM for Reinforcement Learning

Authors

  • Peter Dayan
  • Geoffrey E Hinton
Abstract

We discuss Hinton’s (1989) relative payoff procedure (RPP), a static reinforcement learning algorithm whose foundation is not stochastic gradient ascent. We show circumstances under which applying the RPP is guaranteed to increase the mean return, even though it can make large changes in the values of the parameters. The proof is based on a mapping between the RPP and a form of the expectation-maximisation procedure of Dempster, Laird & Rubin (1977).
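As a rough illustration of the claim in the abstract, the sketch below applies a relative-payoff style update to a small set of independent binary units. It assumes the commonly cited form of the update, in which each unit's firing probability is replaced by the reward-weighted average of its activity, p_i <- E[a_i r] / E[r], with non-negative rewards. The reward function `toy_reward`, the helper names, and the three-unit setup are hypothetical and chosen only for illustration; expectations are computed exactly by enumeration rather than by sampling.

```python
from itertools import product


def action_probability(p, a):
    """Probability of binary action vector a under independent Bernoulli units."""
    prob = 1.0
    for p_i, a_i in zip(p, a):
        prob *= p_i if a_i else 1.0 - p_i
    return prob


def expected_return(p, reward):
    """Exact mean return, enumerating every binary action vector."""
    return sum(action_probability(p, a) * reward(a)
               for a in product((0, 1), repeat=len(p)))


def rpp_step(p, reward):
    """One assumed relative-payoff update: p_i <- E[a_i * r] / E[r].

    Requires non-negative rewards with a strictly positive mean.
    """
    mean_r = 0.0
    weighted = [0.0] * len(p)
    for a in product((0, 1), repeat=len(p)):
        w = action_probability(p, a) * reward(a)
        mean_r += w
        for i, a_i in enumerate(a):
            weighted[i] += w * a_i
    return [w_i / mean_r for w_i in weighted]


def toy_reward(a):
    # Hypothetical non-negative reward, not taken from the paper.
    return 1.0 + 2.0 * a[0] + a[1] * (1 - a[2])


p = [0.5, 0.5, 0.5]
history = [expected_return(p, toy_reward)]
for _ in range(5):
    p = rpp_step(p, toy_reward)
    history.append(expected_return(p, toy_reward))
```

In this toy setting the entries of `history` should never decrease, even though a single update can move the probabilities a long way from their starting values, which is the behaviour the abstract describes.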

Similar resources

EM for Perceptual Coding and Reinforcement Learning Tasks

The paper presents an algorithm for EM-based, reinforcement-driven clustering. As shown here, it is applicable to the reinforcement learning setting with a continuous state space and a discrete action space. The E-step of the algorithm computes the posterior given the data and the reinforcement. Although designed to discover intrinsic states, the algorithm performs action selection without explicit state identi...

Role of the Primate Ventral Tegmental Area in Reinforcement and Motivation

Monkey electrophysiology suggests that the activity of the ventral tegmental area (VTA) helps regulate reinforcement learning and motivated behavior, in part by broadcasting prediction error signals throughout the reward system. However, electrophysiological studies do not allow causal inferences regarding the activity of VTA neurons with respect to these processes because they require artifici...

Cooperative Multi Robot Path Planning using uncertainty Covariance as a feature EECS-545 Final Report

The project aimed to apply machine learning tools to the problem of multi-robot path planning. We have designed a Reinforcement Learning (RL) based multi-agent planner that maximizes the information gained while keeping the robots well localized. We have modified a 2D laser scan matcher to recover a multimodel distribution using the Expectation Maximization (EM) algorithm to ...

Low-Area/Low-Power CMOS Op-Amps Design Based on Total Optimality Index Using Reinforcement Learning Approach

This paper presents the application of reinforcement learning to automatic analog IC design. In this work, a multi-objective approach based on learning automata is evaluated for meeting the required functionality and performance specifications while minimizing MOSFET area and power consumption for two well-known CMOS op-amps. The results show the ability of the proposed method to ...

An Adaptive Learning Game for Autistic Children using Reinforcement Learning and Fuzzy Logic

This paper presents an adapted serious game for rating social ability in children with autism spectrum disorder (ASD). The required measurements are obtained from the challenges in the game. The game uses reinforcement learning concepts to be adaptive, and it relies on fuzzy logic to evaluate the social ability level of children with ASD. The game adapts itsel...

Publication date: 1997